(57) ABSTRACT

A method of determining the motion vector of a block of a 
video frame with respect to a reference video frame. The 
video frame and the reference video frame comprise pixels, 
wherein each pixel has a pixel value. The method comprises 
first determining a plurality of sets of error norms, wherein 
each error norm within one of the sets is related to a different 
position of the block in the reference video frame, wherein 
the error norms are calculated by a norm of an error which 
is given by functions of the pixel values of the block of the 
video frame and the reference video frame, and wherein 
each set of the plurality of sets is related to a different 
function. Second, the motion vector with the smallest error 
norm is selected. 

29 Claims, 9 Drawing Sheets 
Figure 1. Wavelet-based video encoder with temporal prediction in the wavelet 
domain. 




Figure 2. AD and AS as a function of s and k = 1, for the biorthogonal filters (2.4) and 
(5.5). 

[Plot for Figure 3: curves "AD, Biort. (9.7)" and "AS, Biort. (9.7)"; horizontal axis: shift s.]

Figure 3. AD and AS as a function of s and k = 1, for the biorthogonal filters (9.7). 
Figure 4. "Mobile & Calendar" sequence, converted to 256x256 format 




Figure 5. The FS-AS/AD (2,4,8) method attains a minimum with the AS 
criterion (white blocks) or AD criterion (black blocks). 

Figure 6. Coding results of the wavelet video encoder, using FS-AS/AD or FS-AD, 
and the MPEG-4 Verification Model (VM) for the "Mobile & Calendar" 
sequence. Intra wavelet coding of all frames is also indicated. 



Figure 7. Reconstructed last frame of "Mobile & Calendar" sequence (30.5 dB, 
0.297 bpp). 

Figure 9. Coding results of the wavelet video encoder using FS-AS/AD (4,4,4) for the 
"Miss America" sequence, (a) obtained quality and (b) number of bytes per frame. The 
obtained average quality is 36.6 dB at a compression ratio of 113. 
Figure 10. "Carphone" sequence, converted to 128x128 format 

(d) 24 Kbit/s, 28.59 dB  (e) [bitrate illegible], 27.27 dB  (f) [bitrate illegible], 25.23 dB 

Figure 10. "Carphone" sequence (128x128, 15 fps) coded at the respective bitrates 
and resulting in the indicated average qualities. 
METHOD AND SYSTEM FOR VIDEO 
COMPRESSION 

RELATED APPLICATIONS 

This application claims priority to U.S. Provisional 
Application No. 60/126,059, filed on Mar. 25, 1999. 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 
The invention relates to video compression techniques. 

2. Description of the Related Art 

A video information stream comprises a sequence of 
video frames. Each of the video frames can be considered as 
a still image. The video frames are represented in a digital 
system as an array of pixels. The pixels comprise luminance 
or light intensity and chrominance or color information. 
The light and color information is stored in a memory 
of the digital system. For each of the pixels some bits are 
reserved. From a programming point of view each video 
frame can be considered as a two-dimensional data type. 
Note that fields from an interlaced video sequence can also 
be considered as video frames. 

In principle, when the video information stream must be 
transmitted between two digital systems, this can be realized 
by sending the video frames sequentially in time, for 
instance by sending pixels and thus bits sequentially in time. 

There exist, however, more elaborate transmission 
schemes enabling faster and more reliable communication 
between the two digital systems. The transmission schemes 
are based on encoding the video information stream in the 
transmitting digital system and decoding the encoded video 
information stream in the receiving digital system. Note that 
the same principles can be exploited for storage purposes. 

During encoding, the original video information stream is 
transformed into another digital representation. The digital 
representation is then transmitted. The goal of decoding is to 
reconstruct the original video information stream from the 
digital representation completely when lossless compression 
is used or approximately when lossy compression is used. 

The encoding is based on the fact that temporally nearby 
video frames are often quite similar up to some motion. The 
arrays of pixels of temporally nearby video frames often 
contain the same luminance and chrominance information 
except that the coordinate places or pixel positions of the 
information in the arrays are shifted or displaced. Shifting or 
displacement in position as a function of time defines a 
motion. The motion is characterized by a motion vector. 
Note that although the described similarity up to some 
motion of video frames appears only in ideal cases, it forms 
the basis of encoding based on a translational motion model. 
The transformation between a video frame and a temporally 
nearby video frame can also be a more complicated 
transformation. Such a complicated transformation can form the 
basis of a more complicated encoding method. 

Encoding of the video information stream is done by 
performing encoding of the video frames of the sequence 
with respect to other video frames of the sequence. The other 
video frames are denoted reference video frames. 

The encoding is in principle based on motion estimation 
of the motion between a video frame under consideration 
and a reference video frame. The motion estimation defines 
a motion vector. Motion estimation is based on calculating 
an error norm which is determined by a norm of the 
difference between two video frames. Often the sum of 
absolute differences of pixel values of pixels of the reference 
frame and the video frame under consideration is used as 
error norm. Other error norms can also be used. In the prior 
art essentially all error norms are based on differences 
between pixel values of pixels of both frames. 
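
As an illustration of the error norm described above, the sum of absolute differences between a block and one candidate block of the reference frame can be sketched as follows. This is a minimal sketch, assuming frames and blocks are stored as Python lists of rows; the function name `sad` and the sample values are hypothetical and not taken from the specification.

```python
def sad(block, candidate):
    """Sum of absolute differences between two equally sized 2-D blocks.

    Both arguments are lists of rows (an assumption made for this sketch).
    """
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block, candidate)
        for a, b in zip(row_a, row_b)
    )


# Hypothetical pixel values, chosen only to illustrate the norm.
current = [[10, 12], [11, 13]]
perfect_match = [[10, 12], [11, 13]]
poor_match = [[9, 15], [11, 10]]

print(sad(current, perfect_match))  # identical blocks give a zero error norm
print(sad(current, poor_match))     # a mismatching candidate gives a larger norm
```

A smaller value indicates a better match, so the candidate with the smallest norm is the one that defines the motion vector.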

After the motion is estimated, motion compensation is 
performed. The motion compensation comprises constructing 
a new motion compensated video frame from the 
reference video frame by applying the motion, defined by 
the motion vector. The motion compensated video frame 
comprises the pixels of the reference video frame but 
located at different coordinate places. The motion compensated 
video frame can then be subtracted from the video 
frame under consideration. This results in an error video 
frame. Due to the temporal relation between the video 
frames, the error video frame will contain less information. 
This error video frame and the motion vectors are then 
transmitted, possibly after some additional coding. The 
motion estimation, motion compensation, subtraction and 
additional coding are further denoted as interframe encoding. 
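
The motion compensation and subtraction steps can be sketched for a single block as shown below. This is only an illustrative sketch under stated assumptions: frames are lists of rows, the motion vector is a (row, column) displacement, and the helper names `extract_block`, `subtract_blocks` and `add_blocks` are hypothetical.

```python
def extract_block(frame, top, left, height, width):
    """Cut a height x width block out of a frame stored as a list of rows."""
    return [row[left:left + width] for row in frame[top:top + height]]


def subtract_blocks(block, predicted):
    """Error block: block under consideration minus motion compensated block."""
    return [[a - p for a, p in zip(ra, rp)] for ra, rp in zip(block, predicted)]


def add_blocks(error, predicted):
    """Receiver side: error block plus motion compensated block."""
    return [[e + p for e, p in zip(re_, rp)] for re_, rp in zip(error, predicted)]


# Hypothetical data: a 4x4 reference frame and a 2x2 block of the current frame.
reference = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 8, 0],
    [0, 0, 0, 0],
]
block = [[5, 6], [7, 9]]
block_position = (0, 0)   # (row, column) of the block in the current frame
motion_vector = (1, 1)    # assumed to be the result of motion estimation

predicted = extract_block(
    reference,
    block_position[0] + motion_vector[0],
    block_position[1] + motion_vector[1],
    2,
    2,
)
error = subtract_blocks(block, predicted)
restored = add_blocks(error, predicted)
print(error)
print(restored == block)
```

The error block is mostly zero here, which illustrates why the error video frame carries less information than the frame itself.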

The interframe encoding can be limited to a part of a 
video frame. The interframe encoding is also not performed 
on the video frame as a whole but on pieces of the video 
frame. The video frame is divided into non-overlapping or 
even overlapping blocks. The blocks define a region in the 
video frame. The blocks can be of arbitrary shape. The 
blocks can be rectangular, triangular, hexagonal or any other 
shape, regular or irregular. 

The blocks are thus also arrays of pixels but of smaller 
size than the video frame array. The interframe encoding 
operations are then performed on essentially all the blocks of 
the video frame. As the encoding of a video frame is 
performed with respect to a reference video frame, implicitly 
a relation is defined between the blocks of the video frames 
under consideration and the blocks of the reference video 
frame. Indeed the calculation of the sum of absolute differences 
or any other error norm is only performed for a 
block of the video frame under consideration and blocks of the 
reference video frame that are located nearby. These 
locations are bounded by the maximum length of the motion 
vector or, more precisely, by the minimum and maximum 
component values of the motion vector. In the case of a pure 
translational motion model these minimum and maximum 
component values correspond to the search ranges. The 
resulting locations define the search area in the reference 
video frame. 
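
A full search over such a search area can be sketched as follows. The sketch assumes a pure translational model, a symmetric search range in both directions, and the sum of absolute differences as error norm; the function names and the test data are hypothetical.

```python
def sad(block, candidate):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block, candidate)
        for a, b in zip(row_a, row_b)
    )


def full_search(block, reference, top, left, search_range):
    """Return (smallest norm, (dy, dx)) over all positions in the search area.

    (top, left) is the block position in the frame under consideration;
    candidates falling outside the reference frame are skipped.
    """
    height, width = len(block), len(block[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0:
                continue
            if y + height > len(reference) or x + width > len(reference[0]):
                continue
            candidate = [row[x:x + width] for row in reference[y:y + height]]
            cost = sad(block, candidate)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best


# Hypothetical reference frame containing the block pattern at row 2, column 3.
reference = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 2],
    [0, 0, 0, 3, 4],
    [0, 0, 0, 0, 0],
]
block = [[1, 2], [3, 4]]
print(full_search(block, reference, top=1, left=1, search_range=2))
```

The search range directly bounds the components of the motion vector, which is why the cost of a full search grows with the square of the range.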

Wavelets have proven to be successful in compressing 
still images. Compared to the classical DCT approach 
(JPEG), the wavelet-based compression schemes offer the 
advantage of a much better image quality obtained at very 
high compression ratios. Still image compression via the 
wavelet transform leads to graceful image degradation at 
increased compression ratios, and does not suffer from the 
annoying blocking artefacts, which are typical for JPEG at 
very low bit rates. Another advantage of wavelets over DCT 
is the inherent multiresolution nature of the transformation, 
so that progressive transmission based on scalability in 
quality and/or resolution of images comes in a natural way. 
These advantages can be efficiently exploited for sequences 
of video frames, especially in very low bit rate applications 
that can benefit from the improved image quality. Moreover, 
the progressive transmission capability is important to sup- 
port variable channel bandwidths. 

A straightforward approach to build a wavelet-based 
video codec is to replace the DCT in a classical video coder 
by the discrete wavelet transform [Dufaux F., Moccagatta I. 
and Kunt M. "Motion-Compensated Generic Coding of 
Video Based on a Multiresolution Data Structure". Optical 
Engineering, 32(7):1559-1570, 1993.][Martucci S., Sodagar 
I. and Zhang Y.-Q. "A Zerotree Wavelet Video Coder". IEEE 
Trans. on Circ. and Syst. for Video Techn., 7(1):109-118, 
1997.]. A drawback of this implementation is that for 
interframe encoding the wavelet transform is applied to the 
complete error video frame, which contains blocking 
artefacts. These artificial discontinuities, introduced in the 
motion vector field, lead to undesirable high-frequency 
subband coefficients that reduce the compression efficiency. 

To avoid this limitation, the discrete wavelet transform is 
taken out of the temporal prediction loop, which results in the 
video encoder depicted in FIG. 1 [Zhang Y.-Q. and Zafar S. 
"Motion-Compensated Wavelet Transform Coding for Color 
Video Compression". IEEE Trans. on Circ. and Syst. Video 
Techn., 2(3):285-296, 1992.]. Before the motion estimation 
(ME) and motion compensation (MC), the discrete wavelet 
transform (DWT) is calculated on the video frames, obtaining 
for each of the video frames an average subimage and 
detail subimages (FIG. 12). 

Both the motion estimation and compensation are 
performed in the wavelet transform domain, i.e., in the average 
subimage of the highest level and in the detail subimages. 
This is feasible since the wavelet subimages contain not only 
frequency information but also spatial information, which is 
not the case for the DCT. The advantages of such a codec 
are: (1) the blocking artefacts due to the motion vector (MV) 
field are no longer transformed to the wavelet transform 
domain and (2) no inverse discrete wavelet transform 
(IDWT) is needed, so that from an implementation point of 
view, both hard- and software, the encoder can be simplified. 

However, difficulties are encountered with this approach, 
because in general the discrete wavelet transform is not shift 
invariant [Cafforio C., Guaragnella C. and Picco R. "Motion 
Compensation and Multiresolution Coding". Signal Proc.: 
Image Communication, 6:123-142, 1994.], due to the 
subsampled nature of the transform. This implies that shifts in 
the spatial domain do not just produce shifts in the wavelet 
transform domain subimages, but change the values of 
the coefficients in the subimages as well. Motion estimation 
and compensation are not as simple as in the spatial domain, 
where blocks are taken out of the reference video frame and 
are used to predict the considered video frame. In the 
wavelet transformed video frames the required blocks are 
not directly available, therefore one cannot use the same 
techniques as in the spatial domain. However, there is an 
exception if the shifts in the spatial domain are multiples of 
the sampling period. A dyadic wavelet transform is 
completely shift invariant if the spatial domain shift has the form 
$d \cdot 2^J$, $d \in \mathbb{Z}$, where J denotes the number of decomposition 
levels (see FIG. 12). In this case, the same motion estimation 
and compensation approaches can be used in the wavelet 
transform domain and the spatial domain. 
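
The shift behaviour described above can be checked numerically for a single decomposition level (J = 1). The sketch below is illustrative only: it assumes an unnormalized Haar highpass filter instead of the biorthogonal filters discussed in this document, uses circular shifts to avoid boundary effects, and all names and sample values are hypothetical.

```python
def haar_detail(signal):
    """One-level detail (highpass) band of an unnormalized Haar transform."""
    return [signal[i] - signal[i + 1] for i in range(0, len(signal), 2)]


def circular_shift(seq, k):
    """Shift a sequence k positions to the right, wrapping around."""
    k %= len(seq)
    return seq[-k:] + seq[:-k] if k else list(seq)


signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 0, 0]
detail = haar_detail(signal)

# Even spatial shift (a multiple of 2**J): the detail band is simply shifted.
even_shifted = haar_detail(circular_shift(signal, 2))
print(even_shifted == circular_shift(detail, 1))

# Odd spatial shift: the coefficient values themselves change, so no shift
# of the original detail band reproduces them.
odd_shifted = haar_detail(circular_shift(signal, 1))
print(any(odd_shifted == circular_shift(detail, s) for s in range(len(detail))))
```

For the odd shift, no block of the reference detail band matches the new coefficients exactly, which is the difficulty that motion estimation in the wavelet domain has to deal with.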

Some methods have already been introduced in [Mandal 
M. K., Chan E. and Panchanathan S. "Multiresolution 
Motion Estimation Techniques for Video Compression" 
(preprint)], [Zhang Y.-Q. and Zafar S. "Motion-Compensated 
Wavelet Transform Coding for Color Video 
Compression". IEEE Trans. on Circ. and Syst. Video Techn., 
2(3):285-296, 1992]. They perform a hierarchical motion 
estimation in the wavelet detail subimages by using the 
mean absolute difference error (MAE), or the mean square 
difference error (MSE) as an error norm of the difference 
between two video frames or video frame blocks. To obtain 
the wavelet error video frame, the new motion compensated 
wavelet video frame is subtracted from the considered 
wavelet video frame (see FIG. 1), just as one would do in the 
spatial domain. However, since spatial shifts produce 
ambiguous effects in the wavelet domain, one must conclude 
that new methods are required for motion estimation and 
compensation in the wavelet transform domain. 

As a conclusion it can be stated that the prior-art motion 
estimation and compensation methods are based on subtracting 
the motion compensated video frame from the video frame 
under consideration for creating the error video frame and 
that the motion vector estimation is based on differences 
between pixel values. 

SUMMARY OF THE INVENTION 

A method and system for video compression, compatible 
with and exploiting the characteristics of a state-of-the-art 
image transformation in the compression, is presented. In 
the method and the system a plurality of error norms are 
exploited, the error norms being intrinsically related to the 
characteristics of the state-of-the-art image transformation. 

The invention is illustrated for video compression tech- 
niques based on a translational motion model, thus exploit- 
ing motion estimation and compensation, but is not limited 
hereto. 

The invention is further illustrated with the wavelet 
transformation as image transformation but is not limited 
hereto. 

In the first aspect of the invention, the motion vector of 
a block of a video frame under consideration with respect to 
a reference video frame is determined by exploiting a 
plurality of sets of error norms. The error norms within one 
set are determined by calculating the norm of an error which 
is given by a function, characteristic for the set, of the pixel 
values of the block of the video frame and pixel values of the 
reference video frame, for different positions of the block 
with respect to the reference video frame. Each set 
corresponds to a different function. 

In a first embodiment of this aspect of the invention, the 
norms are calculated for weighted sums of pixel values, the 
weighted sums being characterized by a weighting vector. The 
norms of different sets correspond to different weighting 
vectors. 
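
The weighted-sum formulation can be sketched as follows. This is a minimal sketch, assuming a two-component weighting vector applied to the pixel of the block and the corresponding pixel of the reference frame; the names `weighted_error_norm`, `DIFFERENCE` and `SUM` and the sample values are hypothetical.

```python
def weighted_error_norm(block, candidate, weights):
    """Sum of absolute weighted sums of corresponding pixel values.

    weights = (w_block, w_candidate) is the weighting vector of one set.
    """
    w_block, w_candidate = weights
    return sum(
        abs(w_block * a + w_candidate * b)
        for row_a, row_b in zip(block, candidate)
        for a, b in zip(row_a, row_b)
    )


DIFFERENCE = (1, -1)  # weighting vector giving a difference-based norm
SUM = (1, 1)          # weighting vector giving a sum-based norm

# Hypothetical coefficient values: the candidate is close to the negated block.
block = [[2, -3], [1, 0]]
candidate = [[-2, 3], [-1, 1]]

print(weighted_error_norm(block, candidate, DIFFERENCE))
print(weighted_error_norm(block, candidate, SUM))
```

With these values the sum-based norm is much smaller than the difference-based norm, which is the situation in which a second set of error norms pays off.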

In a second embodiment of this aspect of the invention, 
one set of error norms is based on summing pixel values and 
another on subtracting pixel values of the block under 
consideration and the reference video frame. Both sets are 
exploited in the determination of the motion vector. 

In a third embodiment of this aspect of the invention, the 
motion vector of a block of a video frame with respect to a 
reference video frame is determined by exploiting two sets 
of error norms, the first is based on the sum of absolute 
differences of pixel values and the second on the sum of 
absolute sums of pixel values. 

In a fourth embodiment of this aspect of the invention, the 
motion vector of a block of a video frame with respect to a 
reference video frame is determined by exploiting two sets 
of error norms, the first is based on the sum of squared 
differences of pixel values and the second on the sum of 
squared sums of pixel values. 

In a fifth embodiment of the first aspect of the invention 
the video frame and the reference video frame contain 
wavelet transformed subimages. The prediction error of the 
detail subimages can be reduced if one considers both 
summing and subtracting the original and the predicted 
blocks [Van der Auwera G., Munteanu A., Lafruit G., 
Cornelis J. "Video Coding Based on Motion Estimation in 
the Wavelet Detail Images". Proceedings of the International 
Conference on Acoustics, Speech, and Signal Processing 
(ICASSP), pp. 2801-2804, Seattle, May 1998.], [Van der 
Auwera G., Munteanu A., Lafruit G., Cornelis J. "A New 
Technique for Motion Estimation and Compensation of the 
Wavelet Detail Images". Eusipco, Rhodos, September 
1998.]. 

In a second aspect of the invention a system is presented 
for encoding a sequence of video frames, exploiting temporal 
redundancy, via motion estimation and compensation 
techniques, using a plurality of sets of error norms in the 
motion estimation and a plurality of operations for error 
block determination that are compatible with each set of 
error norms. 

In a first embodiment of the second aspect of the invention 
both summing and taking differences of pixel values are 
exploited for determination of the motion vector. Further, 
summing of pixel values or taking differences of pixel values 
is exploited in determination of the error block. The system 
comprises dedicated circuitry for performing image 
transformation, motion estimation, motion compensation 
and construction of the error video frame, the motion 
estimation exploiting both summing and taking differences 
of pixel values, the construction of the error video frame also 
exploiting either summing or taking differences of pixel 
values. 

In a further embodiment the image transformation is a 
J-level wavelet transformation. 

In a further embodiment the summing and taking differences 
of pixel values are performed by separate circuits. 

In a further embodiment the system comprises a frame 
encoding circuit (FIG. 11, (30)) for encoding the error video 
frame, and a frame decoding circuit (FIG. 1) for decoding 
the error video frame. Furthermore, extra encoding can be 
provided at the output of the interframe encoding loop (FIG. 
11 (140)). 

In a second embodiment of the second aspect of the 
invention the system is adapted for performing image 
transformation, motion estimation, motion compensation 
and construction of the error video frame, the 
motion estimation exploiting both summing and taking 
differences of pixel values, the construction of the error 
video frame also exploiting either summing or taking 
differences of pixel values. The system can be either a 
general purpose processor or a dedicated circuit or a 
combination of both. 

In a third aspect of the invention a system is presented for 
decoding a sequence of video frames, being encoded by 
exploiting temporal redundancy, via motion estimation and 
compensation techniques, using a plurality of sets of error 
norms in the motion estimation and a plurality of operations 
for error block determination that are compatible with each 
set of error norms. The system inputs or loads an error block 
and performs a decoding operation on the error block. The 
system also performs a motion compensation of a block of 
a reference video frame. Note that the reference video frame 
can be a stored image, being an image transmitted earlier, or 
just a previously received image. Said motion compensation is 
based on an inputted motion vector. The motion vector is 
determined by one of a plurality of sets of error norms, each 
of the sets being related to a substantially different function 
of pixel values. Based on the motion compensated block of 
the reference video frame and the decoded error block, a 
block of a video frame is determined with operations being 
compatible with the function of pixel values used for 
determining the motion vector. 

As an example, when the motion vector is determined by 
using a sum of absolute differences, then the error block 
must be summed with the motion compensated block. When 
the motion vector is determined by using a sum of absolute 
sums, then the error block and the motion compensated 
block must be subtracted. 

In an aspect of the invention it is recognized that one 
needs at the decoding peer information on how the motion 
vector has been determined, thus which functions have been 
used for calculating the minimal error. Therefore an 
identifier is introduced which is used in the decoding methods and 
decoding system, for selecting the appropriate operations for 
reconstruction of the block. In an aspect of the invention it 
is recognized that besides the traditional stream of 
information, such as an encoded error block and motion 
vector, the extra information embedded in the 
identifier must also be transmitted. 

One embodiment of the invention presents a method and 
system for video compression, compatible with and 
exploiting the characteristics of state-of-the-art image 
transformations. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1. (prior art) Video encoder, being a cascade of a 
wavelet transform circuit (DWT, being a discrete wavelet 
transform circuit), an interframe encoding circuit and an 
extra encoding circuit, the second circuit comprising an 
error video frame encoding circuit, an error video frame 
decoding circuit, a motion estimation circuit (ME) and a 
motion compensation circuit (MC), the motion estimation 
circuit exploiting as estimation criterion the sum of absolute 
differences, the error video frame being constructed as the 
difference between the motion compensated block of the 
reference video frame and the block of the video frame 
under consideration. The video encoder can also contain 
some buffers (BUF). INTER denotes interframe 
coding, thus exploiting the temporal redundancy. 
INTRA denotes intraframe coding, thus coding of each 
video frame separately. 

FIG. 2. AD and AS as a function of s and k=1, for the 
biorthogonal filters (2.4) and (5.5). 

FIG. 3. AD and AS as a function of s and k=1, for the 
biorthogonal filters (9.7). 

FIG. 4. "Mobile & Calendar" sequence, converted to 
256x256 format. 

FIG. 5. The FS-AS/AD (2,4,8) method attains a minimum 
with the AS criterion (white blocks) or AD criterion (black 
blocks). 

FIG. 6. Coding results of the wavelet video encoder, using 
FS-AS/AD or FS-AD, and the MPEG-4 Verification Model 
(VM) for the "Mobile & Calendar" sequence. Intra wavelet 
coding of all frames individually is also indicated. 

FIG. 7. Reconstructed last frame of "Mobile & Calendar" 
sequence (30.5 dB, 0.297 bpp). 

FIG. 8. "Miss America" sequence, converted to 256x256 
format. 

FIG. 9. Coding results of the wavelet video encoder using 
FS-AS/AD (4,4,4) for the "Miss America" sequence, (a) 
obtained quality and (b) number of bytes per frame. 

FIGS. 10A&B. "Carphone" sequence (128x128, 15 fps) 
coded at the respective bitrates and resulting in the indicated 
average qualities. 

FIG. 11. Video encoder, being a cascade of a wavelet 
transform circuit (10) and an interframe encoding circuit 
(110), the latter circuit comprising a frame encoding 
circuit (30), a frame decoding circuit (40), a motion 
estimation circuit (70) and a motion compensation circuit (80), the 
motion estimation circuit exploiting two estimation error 
norms, one of the error norms exploiting summing of pixel 
values, the other exploiting difference of pixel values, the 
error video frame being constructed as either the difference 
between or the sum of the motion compensated block of the 
reference video frame and the block of the video frame 
under consideration in a circuit (20). The estimated error 
norms can be sums of absolute sums of pixel values and the 
sum of absolute differences between pixel values. The 
motion estimation circuit can comprise two parallel 
circuits (90) and (100), each of the circuits determining one 
of the estimation error norms. The wavelet transform circuit 
can equally be any other image transformation circuit. The 
circuit (110) can further comprise buffers, like (60), block 
summing or subtracting blocks like (50) and various 
connections between the circuits (20), (30), (40), (50), (60), 
(70), (80). The motion estimation circuit (70) produces the 
motion vectors (190), being consumed by the motion 
compensation circuit (80) and also transmitted (130) together 
with the encoded video frame (120). Note that after the 
video encoder extra encoding circuits (140) can be placed. 
Note that alternatively to summing in block (20) and 
subtracting in block (50) the motion compensation block can 
perform an inversion. In that case the block (20) has as 
functionality subtracting while the block (50) has as 
functionality summing. 

FIG. 12. A J-level wavelet transform image with J=3. 

FIG. 13. Video decoder (290), comprising a frame 
decoding circuit (210), a motion compensation circuit (220) 
and a summing/difference circuit (230). The motion 
compensation circuit has as input the motion vectors (270) and 
the reference frame (280). The summing/difference circuit 
exploits either summing of an error block from the error 
video frame (260) with the motion compensated block of the 
reference video frame (280), stored in storage unit or buffer 
(250), or subtracting the motion compensated block from the 
corresponding error block, depending on the motion 
estimation criteria exploited in the video encoder. Note that the 
frame decoding circuit (210), the motion compensation 
circuit (220) and the summing/difference circuit (230) have 
essentially the same functionality as their respective 
counterparts (40), (80) and (50) in the encoder of FIG. 11. The 
video decoder (290) can be preceded by an extra decoding 
circuit (200), with an inverse functionality of the circuit 
(140) of the encoder. The video decoder can further 
comprise an inverse wavelet transform circuit (240). The 
cascade of decoding circuit (200) and decoding circuit (210) 
can be a single decoding circuit. Again the alternative 
approach of performing inversions in the motion 
compensation block can be exploited here. 

DETAILED DESCRIPTION OF THE 
INVENTION 

First an analysis in a 1-dimensional case of a wavelet 
transformed function and the effects of shifting in position 
are presented. 

The detail images of wavelet transformed images contain 
high frequency information which corresponds mainly to 
edges in the spatial domain. To facilitate the calculations, the 
analysis is restricted to the one-dimensional case, and an 
arbitrary edge is modeled by a step profile. 

Denote by h and g the filters used to perform a 
one-dimensional biorthogonal wavelet analysis of the step 
function x(n). The lowpass filter h is symmetric around n=0, 
while the highpass filter g is symmetric around n=-1. 
Consider that g(n) has 2N+1 coefficients, and introduce the 
notation $\tilde{g}(n)=g(n-1)$. The highpass component obtained 
from a one level wavelet analysis of x(n) is given by: 

$$x_g(n) = \tilde{g}(0)\,x(2n+1) + \sum_{p=1}^{N} \tilde{g}(p)\,[x(2n+1-p) + x(2n+1+p)]$$

Denote by $x_g(n-s)$ the signal obtained by shifting with s 
positions the wavelet component $x_g(n)$, and by y(n) the 
signal obtained by shifting with k positions the original 
signal x(n): y(n)=x(n-k). The highpass component of a one 
level wavelet analysis of y(n) is $y_g(n)$. If k is even, it is 
known from prior art that the one level wavelet transform is 
shift invariant, therefore we obtain a zero prediction error if 
we subtract the original samples $y_g(n)$ and the predicted 
samples $x_g(n-k/2)$. Conversely, if k is odd, the absolute sum 
between the predicted samples $x_g(n-s)$ and the original 
samples $y_g(n)$ is lower than the absolute difference, for 
specific values of s. We show this in the following for the 
particular case k=1 and s=1. 

It is easy to prove that the general case of an odd shift k 
can be restricted to the particular case k=1. We will assume 
k=1 in the remainder of this section, which leads to the 
highpass component $y_g$ given by: 

$$y_g(n) = \tilde{g}(0)\,x(2n) + \sum_{p=1}^{N} \tilde{g}(p)\,[x(2n-p) + x(2n+p)]$$

We denote by AD and AS the absolute difference and, 
respectively, the absolute sum between the shifted wavelet 
component $x_g(n-s)$ and $y_g$. The expressions of AD and AS for 
s=1 are: 

$$AD = \sum_{n} |x_g(n-1) - y_g(n)| = \sum_{n} d(n),$$

$$AS = \sum_{n} |x_g(n-1) + y_g(n)| = \sum_{n} s(n).$$

Taking into account that $\tilde{g}(n)=\tilde{g}(-n)$, we derive: 

$$d(n) = \Bigl|\sum_{p=1}^{N+1} [\tilde{g}(p-1) - \tilde{g}(p)] \cdot (x(2n-1+p) - x(2n-p))\Bigr|$$

$$s(n) = \Bigl|\sum_{p=1}^{N+1} [\tilde{g}(p-1) + \tilde{g}(p)] \cdot (x(2n-1+p) + x(2n-p))\Bigr|$$

where $\tilde{g}(N+1)=0$. Since the input signal x(n) is a step 
function, it can be proven that $d(n)=d(-n)=|\tilde{g}(2n)|$, for every 
value n verifying $0 \le n \le [N/2]$. 

The expression for s(n) is: 

$$s(n) = \Bigl|\tilde{g}(2n) + 2\sum_{p=2|n|+1}^{N} \tilde{g}(p)\Bigr|, \quad \forall n \in \mathbb{Z},\ |n| \le [N/2].$$

If n<-[N/2] or n>[N/2], we can show that s(n)=d(n)=0. 
Finally, the absolute difference and the absolute sum are 
given by: 
$$AD = |g(-1)| + 2\sum_{n=1}^{[N/2]} |\tilde{g}(2n)|,$$

$$AS = \sum_{n=-[N/2]}^{[N/2]} \Bigl|\tilde{g}(2n) + 2\sum_{p=2|n|+1}^{N} \tilde{g}(p)\Bigr|.$$
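
The closed-form expression for AD can be checked numerically by computing AD and AS directly from their definitions for a step edge with k=1 and s=1. The sketch below is illustrative only: the two symmetric zero-sum highpass filters are hypothetical integer examples, not the biorthogonal filters of Table 1, [N/2] is taken as the integer part of N/2, and all function names are assumptions.

```python
def step(n):
    """Step profile x(n): 0 for n < 0 and 1 for n >= 0."""
    return 1 if n >= 0 else 0


def highpass(signal, g_tilde, n):
    """Highpass sample: sum over q of g~(q) * signal(2n + 1 + q), q = -N..N.

    g_tilde lists g~(0), g~(1), ..., g~(N); symmetry gives g~(-q) = g~(q).
    """
    big_n = len(g_tilde) - 1
    return sum(
        g_tilde[abs(q)] * signal(2 * n + 1 + q) for q in range(-big_n, big_n + 1)
    )


def ad_as(g_tilde, span=10):
    """AD and AS from their definitions, for k = 1 and s = 1."""
    def shifted(n):
        return step(n - 1)  # y(n) = x(n - 1)

    ad = as_ = 0
    for n in range(-span, span + 1):
        predicted = highpass(step, g_tilde, n - 1)  # x_g(n - 1)
        original = highpass(shifted, g_tilde, n)    # y_g(n)
        ad += abs(predicted - original)
        as_ += abs(predicted + original)
    return ad, as_


def ad_closed_form(g_tilde):
    """AD = |g~(0)| + 2 * sum of |g~(2n)| for n = 1..[N/2], with g~(0) = g(-1)."""
    big_n = len(g_tilde) - 1
    return abs(g_tilde[0]) + 2 * sum(
        abs(g_tilde[2 * n]) for n in range(1, big_n // 2 + 1)
    )


short_filter = [2, -1]       # hypothetical 3-tap highpass filter (N = 1)
longer_filter = [6, -2, -1]  # hypothetical 5-tap highpass filter (N = 2)

print(ad_as(short_filter), ad_closed_form(short_filter))
print(ad_as(longer_filter), ad_closed_form(longer_filter))
```

For the 3-tap example the absolute sum vanishes while the absolute difference does not, which mirrors the zero AS entries reported for the shortest filters.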



TABLE 1 

The values of the absolute sum and absolute difference 
for different biorthogonal filters (s = 1). 

Filters               AS       AD 
Biorthogonal (2.4)    0.0000   0.3535 
Biorthogonal (2.8)    0.0000   0.3535 
Biorthogonal (3.9)    0.1768   0.3535 
Biorthogonal (5.5)    0.1772   0.5459 
Biorthogonal (6.8)    0.1315   0.4342 
Biorthogonal (9.7)    0.0883   0.4349 



The values of AD and AS are evaluated for different 
biorthogonal filter banks. As we note from Table 1, the 
absolute sum is smaller than the absolute difference for all 
the considered filters. We observe also that AS is zero for the 
first two filters. Hence, a zero prediction error can be 
obtained if the filter coefficients satisfy the constraint: 



$$\tilde{g}(2n) + 2\sum_{p=2n+1}^{N} \tilde{g}(p) = 0, \quad 1 \le n \le [N/2].$$



FS-AD method (full-search using AD) in the section report- 
ing the experimental results. 

We propose a motion estimation algorithm that performs 
full-search motion estimation on every level of the wavelet 
decomposition and implements two matching criteria for 
finding the best block, namely AS and AD. The block sizes 
on every level and the search ranges arc specified as in the 
FS-AD method. Due to its lowpass nature, in the average 
image we use only AD as matching criterion. 



15 



20 



25 



30 



35 



Similar calculations are made to derive AD and AS for
s≠1 (k=1). For all the tested filters, the minima of the
absolute sum are reached at s=1, and they are smaller than
the minima of the absolute difference. An example is given
in FIG. 2, which depicts AD and AS as a function of s for the
biorthogonal filters (2.4) and (5.5). The same conclusion can
be drawn from FIG. 3 for the biorthogonal
filter (9.7). It follows that the smallest prediction error is
attained if y_g(n) is predicted from x_g(n-1) by using the AS
criterion.

Above we have shown that for odd shifts of the step
function a small or even zero prediction error can be found
if the predicted wavelet coefficients are added to the
original coefficients. If the shift is even, then we have to
subtract them to get a zero error. Next we describe an
algorithm that performs motion estimation in the wavelet
detail images by using two matching criteria, namely AD
and AS. We will compare the resulting prediction error of
our algorithm with the minimal error that can be reached by
just using AD as a matching criterion.
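
The odd/even behaviour described above can be illustrated with a toy example. The sketch below is not the filter bank analysed in the text: it assumes a symmetric 3-tap highpass filter (-1/2, 1, -1/2), keeps the even-indexed outputs after filtering, and uses edge replication at the borders. The helper names (`highpass_detail`, `step`, `abs_diff`, `abs_sum`) are hypothetical.

```python
def highpass_detail(x, g=(-0.5, 1.0, -0.5)):
    """Filter x with a symmetric 3-tap filter and keep the even samples.

    Assumption: the filter taps and the subsampling phase are chosen for
    illustration only; borders are handled by edge replication.
    """
    n = len(x)
    out = []
    for i in range(0, n, 2):
        left = x[i - 1] if i > 0 else x[0]
        right = x[i + 1] if i + 1 < n else x[n - 1]
        out.append(g[0] * right + g[1] * x[i] + g[2] * left)
    return out


def step(n, pos):
    """Step function of length n that jumps from 0 to 1 at index pos."""
    return [1.0 if i >= pos else 0.0 for i in range(n)]


def abs_diff(a, b):
    """AD criterion: sum of absolute differences."""
    return sum(abs(p - q) for p, q in zip(a, b))


def abs_sum(a, b):
    """AS criterion: sum of absolute sums."""
    return sum(abs(p + q) for p, q in zip(a, b))


d0 = highpass_detail(step(16, 8))    # reference step
d1 = highpass_detail(step(16, 9))    # step shifted by one sample (odd shift)
d2 = highpass_detail(step(16, 10))   # step shifted by two samples (even shift)

# Odd shift: the detail coefficient changes sign, so summing cancels it.
as_odd = abs_sum(d0, d1)
ad_odd = abs_diff(d0, d1)

# Even shift: the coefficient moves by one position in the subband,
# so subtracting the displaced coefficients cancels it.
d2_aligned = d2[1:] + [0.0]
ad_even = abs_diff(d0, d2_aligned)
```

With these assumptions the odd shift gives a zero AS and a non-zero AD, while the even shift gives a zero AD once the subband displacement is compensated, which is the qualitative behaviour the paragraph describes.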

The FS with AD method performs full-search motion
estimation on every level of the wavelet decomposition by
using AD as the error criterion, and calculates the error image by
subtracting the predicted wavelet image from the original
image.

In our simulations, we use a 3-level wavelet
decomposition, so the full-search motion estimation is performed
in the four subimages of level 3 and in the six
subimages of levels 2 and 1. To define the block sizes in the
detail images we use two different approaches. In the first
one we impose the same block size in every detail image,
while in the second one we use dyadic block sizes containing
2^(c-j) x 2^(c-j) coefficients, where j denotes the decomposition
level and c is a constant. We identify this algorithm as the






In the FS-AD method, the motion vector is determined by 
the position of the block in the reference image that mini- 
mizes AD. If we also calculate AS for every search position 
in the reference image, then it is possible that the minimum 
obtained with the AS criterion is smaller than the minimum 
given by the AD criterion. If this is the case, then the motion 
vector is determined by the position of the block in the 
reference image where AS is minimal. Conversely, if the 
minimum of AD is the smallest, then the motion vector will 
be the same as for the FS-AD method. It follows that the
prediction error of this method is never larger than that of
the FS-AD method, and is smaller whenever the AS minimum is
the lowest. We refer to this algorithm as the FS-AS/AD method
(full-search using AS and AD).
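
The selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the reported results: subimages are plain lists of rows, candidates that fall outside the reference subimage are skipped, ties are resolved in favour of AD, and the function names `block_errors` and `fs_as_ad` are hypothetical.

```python
def block_errors(cur, ref, top, left, size, dy, dx):
    """Return (AD, AS) for one candidate displacement (dy, dx)."""
    ad = 0.0
    as_ = 0.0
    for r in range(size):
        for c in range(size):
            a = cur[top + r][left + c]
            b = ref[top + dy + r][left + dx + c]
            ad += abs(a - b)    # absolute difference criterion
            as_ += abs(a + b)   # absolute sum criterion
    return ad, as_


def fs_as_ad(cur, ref, top, left, size, search):
    """Full search over [-search, search] in both directions with AD and AS.

    Returns (error, (dy, dx), criterion) for the overall minimum.
    """
    rows, cols = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Assumption: candidates outside the reference are skipped.
            if not (0 <= top + dy and top + dy + size <= rows):
                continue
            if not (0 <= left + dx and left + dx + size <= cols):
                continue
            ad, as_ = block_errors(cur, ref, top, left, size, dy, dx)
            for err, name in ((ad, "AD"), (as_, "AS")):
                if best is None or err < best[0]:
                    best = (err, (dy, dx), name)
    return best


# Toy data: the current block is the negated reference block at (-1, -1).
ref = [[0] * 6 for _ in range(6)]
for r, row in enumerate(([1, 2, 3], [4, 5, 6], [7, 8, 9])):
    for c, v in enumerate(row):
        ref[1 + r][1 + c] = v
cur = [[0] * 6 for _ in range(6)]
cur[2][2], cur[2][3] = -1, -2
cur[3][2], cur[3][3] = -4, -5

result = fs_as_ad(cur, ref, top=2, left=2, size=2, search=2)
```

In this toy case the AS criterion reaches zero at displacement (-1, -1) while AD stays positive everywhere, so the AS position determines the motion vector. Both criteria are evaluated from the same pair of values, so the reference block is read only once per candidate.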

The arithmetic complexity of FS-AS/AD is compared to 
FS-AD [Van der Auwera G., Lafruit G., Cornelis J. "Arith- 
metic Complexity of Motion Estimation Algorithms". IEEE 
Benelux Signal Processing Symposium, pp. 203-206, 
Leuven, March 1998.]. The search ranges in the detail 
images are 2^(4-j) pixels, with j being the decomposition level.
Table 2 contains the number of arithmetic operations for the 
motion estimation process using 256x256 images. It follows 
that the FS-AS/AD method takes twice the number of 
operations of the FS-AD method, because it makes use of 
two matching criteria in parallel. We can also compare 
FS-AS/AD to the equivalent full search method in the spatial 
domain, i.e. level 0 of the wavelet decomposition or the 
original image. If we use an equivalent search range of 16, 
then we obtain 214.1 million operations, so more than two 
times the amount of FS-AS/AD. 
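
The order of magnitude of these counts can be checked with a short calculation. The sketch below rests on assumptions that are not stated in the text: three operations per coefficient per candidate position (add or subtract, absolute value, accumulate), (2r+1)^2 candidate positions for a search range r, no special treatment of image borders, and a single AD-only pass for the average image. The names `OPS_PER_COEFF`, `positions`, `spatial_ops` and `wavelet_ops` are hypothetical.

```python
OPS_PER_COEFF = 3  # assumption: add/subtract, absolute value, accumulate


def positions(search):
    """Number of candidate positions for a search range of +/- search."""
    return (2 * search + 1) ** 2


def spatial_ops(size=256, search=16):
    """Full search on the original image (level 0)."""
    return size * size * positions(search) * OPS_PER_COEFF


def wavelet_ops(size=256, levels=3, criteria_detail=2):
    """Full search on every subimage of a `levels`-level decomposition.

    criteria_detail = 1 models AD only, 2 models AS and AD together.
    """
    total = 0
    for j in range(1, levels + 1):
        sub = size >> j            # subimage side length on level j
        search = 2 ** (4 - j)      # search range 2^(4-j), as in the text
        per_subimage = sub * sub * positions(search) * OPS_PER_COEFF
        total += 3 * per_subimage * criteria_detail  # three detail subimages
        if j == levels:
            total += per_subimage  # average image: AD only
    return total


spatial = spatial_ops()
both = wavelet_ops(criteria_detail=2)
ad_only = wavelet_ops(criteria_detail=1)
```

Under these assumptions the spatial-domain count is 214,106,112, which agrees with the 214.1 million quoted above. The wavelet-domain count with both criteria comes to about 91.7 million, close to but not identical with the 91.6 million listed for FS-AS/AD in Table 2, so the counting conventions of the cited complexity study presumably differ in some detail from this sketch.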

The arithmetic complexity determines the necessary 
hardware, but it is not the only factor to take into consid- 
eration. If one also considers energy dissipation, then the 
memory transfers will be the dominant factor. E.g. according 
to [Gordon B., Tsern E., Meng T. "Design of a Low Power
Video Decompression Chip Set for Portable Applications".
Journal of VLSI Signal Processing Systems, no. 13, 1996.],
an external memory access consumes approximately 16000
pJ compared to 7 pJ for an addition.

Without any memory optimization, the calculation of one 
AS or AD criterion takes two external read operations from 
memory. Table 2 contains the number of memory transfers 
for the FS-AS/AD and the FS-AD method. It follows that the 
transfer amount is the same for both methods, since the AS 
and AD criterion can be calculated simultaneously while 
requiring only two memory reads. The energy cost for both 
methods, shown in Table 2, is approximately equal. Hence, 
we conclude that the extra arithmetic complexity of the 
FS-AS/AD method with respect to FS-AD is negligible if 
one considers energy dissipation, since the number of 
required memory transfers does not significantly increase. 
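
The energy argument can be made concrete with the per-operation figures quoted above. The sketch below assumes that every arithmetic operation costs as much as an addition (7 pJ) and every transfer costs one external memory access (16000 pJ). The FS-AS/AD counts are the ones listed in Table 2; the FS-AD operation count is taken as half of that, following the statement that FS-AS/AD needs twice the operations, which is an assumption rather than a tabulated value.

```python
PJ_PER_ACCESS = 16000  # external memory access, figure quoted in the text
PJ_PER_ADD = 7         # addition, figure quoted in the text


def energy_pj(operations, transfers):
    """Energy estimate in pJ; assumes every operation costs one addition."""
    return operations * PJ_PER_ADD + transfers * PJ_PER_ACCESS


ops_fs_as_ad = 91.6e6            # Table 2
transfers = 30.6e6               # Table 2, same for both methods
ops_fs_ad = ops_fs_as_ad / 2     # assumption: half the operations

energy_fs_as_ad = energy_pj(ops_fs_as_ad, transfers)
energy_fs_ad = energy_pj(ops_fs_ad, transfers)

# Share of the total energy that is spent on arithmetic for FS-AS/AD.
arithmetic_share = ops_fs_as_ad * PJ_PER_ADD / energy_fs_as_ad
```

Expressed in units of 10^9 pJ, the two estimates round to 490.2 and 489.9, the same figures as the energy column of Table 2. Arithmetic accounts for roughly 0.1% of the total, which is why doubling the operation count barely changes the energy cost.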
TABLE 2

Arithmetic and memory transfer complexity of the FS-AS/AD
and FS-AD methods (image size: 256x256).

Method      Operations (10^6)   Transfers (10^6)   Energy cost (10^9 pJ)
FS-AS/AD    91.6                30.6               490.2
FS-AD                                              489.9

To assess the performance of the FS-AS/AD method, we
have implemented a software simulation of the wavelet
encoder architecture depicted in FIG. 11. We have chosen
the biorthogonal (9,7) wavelet filters to generate a 3-level
pyramidal image structure for the motion estimation process.
This choice is inspired by the fact that these filters in
general provide the best coding results for photographic
images. Moreover, in Table 1 we have shown that for an odd
shift of the step function the prediction error obtained by the
AS criterion is very low.

The coding results are obtained for eight frames of the
gray-scale "Mobile & Calendar" sequence, which we have
converted to the 256x256 format. This is an ISO class C
sequence, meaning high spatial detail and medium amount
of movement. FIG. 4 depicts the first frame.

To situate the coding performance of our wavelet video
encoder, we compare it with the October '97 MPEG-4
Verification Model (VM) [ISO/IEC JTC1/SC29/WG11
N1902. "Information Technology - Coding of Audio-Visual
Objects: Visual". International Organization for
Standardization (ISO), Fribourg, October 1997] which we put in
unadvanced motion estimation mode. In this mode the
encoder performs motion estimation with half pixel accuracy
and uses 16x16 blocks. Since we have not implemented
B-frames in our wavelet video encoder, the frame
interdependency is restricted to IPPPPPPP. Table 3 contains the
coding results for each frame. Since the "Mobile & Calendar"
sequence is rectangular, no shape coding is required.

TABLE 3

MPEG-4 VM coding results for the "Mobile & Calendar" sequence.

Frame No.   Image Type   PSNR (dB)   bpp (texture)
0           I            31.90       0.8597
1           P            30.57       0.2974
2           P            30.57       0.2890
3           P            30.57       0.2959
4           P            30.50       0.3028
5           P            30.57       0.2955
6           P            30.50       0.3063
7           P            30.50       0.2959

While coding the sequence with our wavelet video
encoder, we impose an identical number of bits per pixel
(bpp) for each frame as for VM. This allows us to compare
the reconstructed quality, expressed by PSNR values, to the
quality of VM.

We compare FS-AS/AD to FS-AD for different block
sizes, denoted by e.g. (2,4,8) representing 2x2 wavelet
coefficients on decomposition level 3, 4x4 on level 2 and
8x8 on level 1. We use identical search ranges for these
motion estimation algorithms, i.e. [-2,2] on level 3, [-4,4]
on level 2 and [-8,8] on level 1. Experiments show that for
the "Mobile & Calendar" sequence AS reaches a smaller
minimum than AD for more than half of the total number of
blocks. This is illustrated in FIG. 5 which shows all blocks
in the wavelet detail images. A block is drawn in white if the
AS criterion reaches the lowest minimum or in black if the
AD criterion attains the lowest minimal value.

To assess the gain obtained by performing motion
estimation and compensation, we also coded the sequence
on a frame by frame basis, i.e. complete intra frame coding
of the sequence using the wavelet transform. The results are
illustrated in FIG. 6.

Inter wavelet coding by using FS-AD or FS-AS/AD,
compared to intra wavelet coding, yields a considerable
quality gain for the same number of bits per frame. The
average gain attained by the worst FS-AD method, i.e.
(8,8,8), is 1.4 dB. For the (4,4,4) method, which
is the best AD method, the average gain is 2.7 dB. If we
compare the quality gains of the FS-AS/AD methods to intra
wavelet coding, then we calculate an average gain of
1.7 dB for the worst method, i.e. FS-AS/AD (8,8,8), and 3.2
dB for FS-AS/AD (4,4,4), which is the best method. Hence, we
conclude that our video encoder achieves a considerable
quality gain by performing motion estimation in the
wavelet domain, compared to intra wavelet transform coding.
Moreover, performing motion estimation in the wavelet
detail images by using both the absolute sum and the
absolute difference as block matching criteria in the
FS-AS/AD method results in a quality gain that varies between 0.3
and 0.5 dB compared to the FS-AD method, which only uses
the absolute difference. In this way the FS-AS/AD (4,4,4)
method gets close to the quality curve of the VM, but does
not surpass it. This is due to the restriction that we impose
the same number of bits for every frame as for VM. By using
our own bit allocation we are able to exceed the VM curve.
This is shown in FIG. 6 by the "FS-AS/AD (4,4,4) bit
allocation" curve. We see that this curve is slightly above the
VM curve for the inter wavelet coded frames. Moreover, the
intra wavelet coded image is approximately 1.5 dB above
the intra coded DCT image. Although we changed the bit
allocation for this sequence, the total number of bits is still
the same as for VM. This indicates that our wavelet video
encoder needs its own bit allocation procedure to attain an
optimal rate distortion result. FIG. 7 depicts the reconstruction
of the last frame of this sequence.

The coding results are obtained for eight frames of the
gray-scale "Miss America" sequence, which we have
converted to the 256x256 format. This is an ISO class A
sequence, meaning low spatial detail and low amount of
movement. FIG. 8 depicts the first frame.

The coding results are obtained for the gray-scale
"Carphone" sequence, which we have converted to the 128x128
format. FIG. 10A depicts the first frame. We have coded this
sequence at different bit rates while maintaining 15 fps. FIG.
10B shows for each bitrate one frame from the reconstructed
sequences together with the average quality.

What is claimed is:

1. A method of determining the motion vector of a block
of a video frame with respect to a reference video frame, the
video frame and reference video frame comprising pixels,
wherein each pixel has a pixel value, the method comprising:

determining a plurality of sets of error norms, wherein
each error norm within one of the sets is related to a
different position of the block in the reference video
frame, wherein the error norms are calculated by a
norm of an error which is given by functions of the
pixel values of the block of the video frame and the
reference video frame, and wherein each set of the
plurality of sets is related to a different function; and

selecting the motion vector with the smallest error norm.

2. The method of claim 1, wherein the norms are calculated
for weighted sums of the pixel values of the block of
the video frame and the reference video frame, wherein the
weighted sums are characterized by a weighting vector, and
wherein each set of the plurality of sets are related to a
different weighting vector.

3. The method of claim 1, wherein the error norms of the 
first set are the sum of absolute differences between pixel 
values of the block of the video frame and the reference 
video frame, and wherein the error norms of the second set 
are the sum of absolute sums of pixel values of the block of
the video frame and the reference video frame. 

4. The method of claim 1, wherein the error norms of the 
first set are the sum of squared differences between pixel 
values of the block of the video frame and the reference 
video frame; and wherein the error norms of the second set
are the sum of squared sums of pixel values of the block of 
the video frame and the reference video frame. 

5. The method of claim 1, wherein the video frame and
the reference video frame contain J-level wavelet transformed
images.

6. A system for encoding a sequence of video frames, the 
video frames comprising pixels, wherein each pixel has a 
pixel value, and wherein the video frames are divided in 
blocks, the system comprising: 

a first circuit for transforming of the video frames from a 
first representation to a second representation;

a second circuit for performing motion estimation for 
blocks of the video frames with respect to a reference 
video frame, wherein the motion estimation exploits a 
plurality of sets of error norms, wherein the error norms
are a norm of an error which is determined by functions 
of the pixel values of the block of the video frame and 
the reference video frame, and wherein each set of the 
plurality of sets is related to a different function; 

a third circuit for performing motion compensation of
blocks of the video frames with respect to the reference 
video frame; and 

a fourth circuit, for determining an error block from two 
of the blocks, exploiting operations on the blocks, 
wherein each of the operations is compatible with one
of the functions. 

7. The system of claim 6, wherein: 

the plurality of sets of error norms comprise a first set of 
error norms and a second set of error norms; 

the error norms of the first set exploits summing of pixel
values of pixels of video frames; 

the error norms of the second set exploits taking differ- 
ences of pixel values of pixels of video frames; and 

the operations on the blocks being either, summing of 
pixel values or taking differences of pixel values of the
blocks. 

8. The system of claim 6, wherein the first circuit is 
adapted for performing a J-level wavelet transformation. 

9. The system of claim 7, wherein the second circuit 
comprises:

a fifth circuit for determining the first error norms; and 
a sixth circuit for determining the second error norms. 

10. The system of claim 7, wherein the system further 
comprises: 

a seventh circuit for frame encoding the error block and
an eighth circuit for frame decoding the error block. 

11. A system for encoding a sequence of video frames, the 
video frames comprising pixels, wherein each pixel has a 
pixel value, the video frames being divided in blocks, the 
system is adapted for:

transforming the images on each of the video frames from 
a first representation to a second representation; 
performing motion estimation for blocks of the video
frames, the motion estimation for blocks of the video 
frames with respect to a reference video frame exploit- 
ing a plurality of sets of error norms, the error norms 
being calculated by norms of an error which is given by 
functions of the pixel values of the block of the video 
frame and the reference video frame, each set of the 
plurality of sets being related to a different function; 

performing motion compensation of blocks of the video 
frames with respect to the reference video frame; and 

determining an error block from two of the blocks. 

12. The system of claim 11, wherein the system deter- 
mines at least part of the error norms in parallel. 

13. A system for decoding a sequence of video frames, the
video frames comprising pixels, wherein each pixel has a 
pixel value, and wherein the video frames being divided into 
blocks, the system comprising: 

a first circuit for inputting and decoding an error block; 

a second circuit for performing motion compensation of a 
block of a reference video frame, based on an inputted 
motion vector, the motion vector being determined by 
one of a plurality of sets of error norms, each of the sets 
being related to a substantially different function of 
pixel values; and 

a third circuit for determining a block of a video frame 
from the motion compensated block of the reference 
video frame and the error block and with operations 
being compatible with the function of pixel values used 
for determining the motion vector. 

14. The system of claim 13, wherein: 

the plurality of sets of error norms comprising a first set 
of error norms and a second set of error norms; 

the error norms of the first set exploiting summing of pixel 
values of pixels of video frames; 

the error norms of the second set exploiting taking dif- 
ferences of pixel values of pixels of video frames; and 

the operations for determining the block of a video frame 
from the motion compensated block of the reference 
video frame and the error block being either summing 
of pixel values or taking differences of pixel values of 
the error block and the motion compensated block of 
the reference video frame. 

15. The system of claim 13 or 14, wherein at least the 
error block, the decoded error block are J-level wavelet 
transformed images. 

16. The system of claim 13, further comprising an inverse 
wavelet transform circuit. 

17. A system for decoding a sequence of video frames, the
video frames comprising pixels, wherein each pixel has a 
pixel value, and wherein the video frames are divided into 
blocks, the system is adapted for: 

inputting and decoding an error block; 

performing motion compensation of a block of a reference 
video frame, based on an inputted motion vector, the 
motion vector being determined by one of a plurality of 
sets of error norms, each of the sets being related to a 
substantially different function of pixel values; and 

determining a block of a video frame from the motion 
compensated block of the reference video frame and 
the error block with operations being compatible with 
the function of pixel values used for determining the 
motion vector. 

18. A method of determining a block of a video frame 
from an error block, a reference video frame and a motion 
vector, the method comprising: 

inputting the error block, the motion vector, and an 
identifier attached to the motion vector; 
performing a motion compensation of a block of the
reference video frame with respect to the reference
video frame, based on the inputted motion vector; and

determining the block of the video frame from the error 
block and the motion compensated block of the refer- 
ence video frame, thereby selecting operations on pixel 
values of the motion compensated block of the refer- 
ence video frame and the error block based on the 
identifier, wherein the motion vector is determined by 
one of a plurality of sets of error norms, wherein each 
of the sets is related to a substantially different function 
of pixel values, wherein the identifier identifies which 
function is exploited for motion vector estimation, and 
wherein the operations on pixel values of the motion 
compensated block of the reference video frame and 
the error block are compatible with the identified 
function. 

19. The method of claim 18, wherein the identifier com- 
prises at least two values, wherein a first of the two values 
is related to a motion vector determination by using the sum 
of absolute differences of pixel values, wherein a second of 
the two values are related to a motion vector determination 
by using the sum of absolute sums of pixel values, wherein 
the operations sum pixel values while the identifier is equal 
to the first value, and wherein the operations subtract pixel 
values while the identifier is equal to the second value. 

20. The method of claim 9 or 19, wherein the motion 
compensation and the determining of the block is performed 
on wavelet transformed images. 

21. The method of transmitting a block of a video frame 
from a first peer to a second peer, comprising: 

determining a motion vector for the block with respect to 
a reference video frame by using one of a plurality of 
sets of error norms, each of the sets being related to a 
substantially different function of pixel values; 

assigning a value to an identifier, based on the function 
being exploited for determining the motion vector; 

compensating the motion of the block with respect to the 
reference video frame by using the motion vector; 

determining an error block based on the motion compensated
block and the reference video frame;

transmitting the motion vector, the identifier and the error 
block from the first peer to the second peer; 

inputting the motion vector, the identifier and the error 
block; 

compensating the motion of the motion compensated 
block with respect to the reference video frame by 
exploiting the inputted motion vector; and 

determining the block of the video frame by performing 
operations on pixel values of the motion compensated 
block from the reference video frame and the error 
block and, thereby selecting operations based on the 
identifier. 

22. The method of claim 21, wherein the identifier com- 
prises at least two values, wherein a first of the two values 
is related to a motion vector determination by using the sum 
of absolute differences of pixel values, wherein a second 
value of the two values is related to a motion vector 
determination by using the sum of absolute sums of pixel 
values, wherein the operations sum pixel values while the 
identifier is equal to the first value, and wherein the opera- 
tions subtract pixel values while the identifier is equal to the 
second value. 

23. The method of claim 21, wherein the motion com- 
pensation and the determining of the block is performed on 
wavelet transformed images. 
24. A system for encoding a sequence of video frames, the
video frames comprising pixels, wherein each pixel has a 
pixel value, and wherein the video frames are divided in 
blocks, the system comprising: 

a first circuit for transforming of the video frames from a 
first representation to a second representation; 

a second circuit for performing motion estimation for 
blocks of the video frames with respect to a reference 
video frame, wherein the motion estimation exploits a 
plurality of sets of error norms, wherein the error norms 
are a norm of an error which is determined by functions 
of the pixel values of the block of the video frame and 
the reference video frame, and wherein each set of the 
plurality of sets are related to a different function; 

a third circuit for performing motion compensation of 
blocks of the video frames with respect to the reference 
video frame; and 

a fourth circuit, for determining an error block from two 
of the blocks, wherein the determining is based upon 
the output of one of the functions. 

25. A method of determining the motion vector of a block 
of a video frame with respect to a reference video frame, the 
video frame and reference video frame comprising pixels, 
wherein each pixel has a pixel value, the method compris- 
ing: 

determining a plurality of sets of error norms, wherein 
each error norm within one of the sets is related to a 
different position of the block in the reference video 
frame, wherein the error norms are calculated by a 
norm of an error which is given by functions of the 
pixel values of the block of the video frame and the 
reference video frame, and wherein each set of the 
plurality of sets is related to a different function; and 

selecting the motion vector having the smallest error 
norm, wherein the plurality of sets of error norms 
comprises a first set of error norms and a second set of 
error norms, wherein the error norms of the first set are 
based upon differences between pixel values of the 
block of the video frame and the reference video frame, 
and wherein the error norms of the second set are based 
at least in part upon summing pixel values of the block 
of the video frame and the reference video frame. 

26. The method of claim 25, wherein the error norms are 
calculated for weighted sums of the pixel values of the block 
of the video frame and the reference video frame, wherein 
the weighted sums are characterized by a weighting vector, 
and wherein each set of the plurality of sets are related to a 
different weighting vector. 

27. The method of claim 25, wherein the error norms of 
the first set are the sum of absolute differences between pixel 
values of the block of the video frame and the reference 
video frame, and wherein the error norms of the second set 
are the sum of absolute sums of pixel values of the block of 
the video frame and the reference video frame. 

28. The method of claim 25, wherein the error norms of 
the first set are the sum of squared differences between pixel 
values of the block of the video frame and the reference 
video frame; and wherein the error norms of the second set 
are the sum of squared sums of pixel values of the block of 
the video frame and the reference video frame. 

29. The method of claim 25, wherein the video frame and 
the reference video frame comprise J-level wavelet trans- 
formed images. 